Clusterability Detection and Cluster Initialization

Authors

  • Scott Epter
  • Mukkai Krishnamoorthy
  • Mohammed J. Zaki
Abstract

The need for a preliminary assessment of the clustering tendency or clusterability of massive data sets is known. A good clusterability detection method should serve to influence a decision as to whether to cluster at all, as well as provide useful seed input to a chosen clustering algorithm. We present a framework for the definition of the clusterability of a data set from a distance-based perspective. We discuss a graph-based system for detecting clusterability and generating seed information including an estimate of the value of k – the number of clusters in the data set, an input parameter to many distance-based clustering methods. The output of our method is tunable to accommodate a wide variety of clustering methods. We have conducted a number of experiments using our methodology with stock market data and with the well-known BIRCH data sets, in two as well as higher dimensions. Based on our experiments and results we find that our methodology can serve as the basis for much future work in this area. We report our results and discuss promising future directions.
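The abstract does not spell out the graph construction, so the following is only an illustrative sketch of one generic distance-based heuristic, not the authors' method. It assumes a threshold graph: two points are linked when their Euclidean distance is at most a chosen cutoff, each connected component is treated as a candidate cluster, the component count serves as a rough estimate of k, and the component centroids serve as seeds. The function name `component_seeds` and the parameter `threshold` are hypothetical.

```python
from itertools import combinations
from math import dist


def component_seeds(points, threshold):
    """Return one centroid per connected component of the threshold graph.

    Illustrative assumption: points closer than `threshold` are linked.
    The number of returned seeds is a rough estimate of k.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points whose distance is within the threshold.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Group point indices by the root of their component.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    # The centroid of each component is a candidate seed.
    seeds = []
    for members in groups.values():
        coords = zip(*(points[i] for i in members))
        seeds.append(tuple(sum(c) / len(members) for c in coords))
    return seeds


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = component_seeds(data, threshold=2.0)
    print("estimated k:", len(seeds))
    print("seeds:", seeds)
```

The pairwise loop is quadratic in the number of points, so for massive data sets this sketch would need sampling or a spatial index. A data set that yields a single component, or nearly one component per point, across a range of thresholds could be read as weakly clusterable under this heuristic.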

Similar resources

Clusterability Detection and Initial Seed Selection in Large Data Sets

The need for a preliminary assessment of the clustering tendency or clusterability of massive data sets is known. A good clusterability detection method should serve to influence a decision as to whether to cluster at all, as well as provide useful seed input to a chosen clustering algorithm. We present a framework for the definition of the clusterability of a data set from a distance-based persp...

Clustering Oligarchies

We investigate the extent to which clustering algorithms are robust to the addition of a small, potentially adversarial, set of points. Our analysis reveals radical differences in the robustness of popular clustering methods. k-means and several related techniques are robust when data is clusterable, and we provide a quantitative analysis capturing the precise relationship between clusterabilit...

Which Data Sets are ‘Clusterable’? – A Theoretical Study of Clusterability

We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in th...

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them...

Clusterability: A Theoretical Study

We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in th...

Publication date: 2000